Abstract

Low-density parity-check (LDPC) codes are one of the most promising families of codes to replace the Goppa
codes originally used in the McEliece cryptosystem. In fact, it has been shown that by using quasi-cyclic
low-density parity-check (QC-LDPC) codes in this system, drastic reductions in the public key size can be achieved
while maintaining fixed security levels. Recently, some proposals have appeared in the literature using codes with
denser parity-check matrices, named moderate-density parity-check (MDPC) codes. However, the density of the
parity-check matrices to be used in QC-LDPC code-based variants of the McEliece cryptosystem has never been
optimized. This paper aims to fill this gap by proposing a procedure for selecting the density of the private
parity-check matrix, based on the security level and the decryption complexity. We provide some examples of
the system parameters obtained through the proposed technique.

I. Introduction

The prospect of quantum computers has driven renewed interest in public-key encryption
schemes that are alternatives to widespread solutions like the Rivest, Shamir, Adleman (RSA) system,
which is based on the integer factorization problem. That problem could be solved in polynomial time by
a quantum computer, and hence would no longer be a hard problem after their advent.

The McEliece and Niederreiter cryptosystems [1], [2], which exploit the hardness of the decoding problem
to implement public-key cryptography, are among the most interesting alternatives to RSA. Secure instances
of these systems are based on Goppa codes and, despite some revision of their parameters due to optimized
cryptanalysis and increased computational power [3], they have never been seriously endangered by
cryptanalysis. However, using Goppa codes has the major drawback of requiring large public keys, whose size
increases quadratically in the security level. Several attempts to replace Goppa codes have been made over the
years, but only a few have resisted cryptanalysis. Among them, variants based on QC-LDPC codes are very promising,
since they achieve very small keys, with size increasing linearly in the security level. These variants are unbroken
up to now, though some refinements have been necessary since their first proposal.

LDPC codes are state-of-the-art iteratively decoded codes, first introduced by Gallager [4], then rediscovered
[5] and now used in many contexts [6]. Recently, LDPC codes have also been introduced in several
security-related contexts, like physical layer security [7]-[9] and key agreement over wireless channels [10]. LDPC
codes were initially thought to be insecure in the McEliece cryptosystem [11], and very large codes were
required to avoid attacks [12]. This scenario changed when it was shown that the permutation matrix
used to obtain the public key from the private key could be replaced with a more general matrix [13]. Although
some adjustments have been necessary after the first proposal, these matrices have made it possible to design
secure and efficient instances of the system based on QC-LDPC codes [14], [15].

Recently, it has been shown that the use of permutation matrices, as in the original McEliece cryptosystem,
can be restored by using codes with increased parity-check matrix density, named MDPC codes [16], [17].
The performance of MDPC codes also does not degrade significantly when there are short cycles in
their associated Tanner graph. This allows for a completely random code design, which has made it possible to
obtain a security reduction to the hard problem of decoding a generic linear code [16].

In this paper, we compare LDPC and MDPC code-based McEliece proposals and provide a procedure to 
optimize the density of the parity-check matrices of the private code, in such a way as to reach a fixed security 
level and, at the same time, keep complexity to the minimum. The paper is organized as follows: in Section 
II, we assess the error correction performance of the codes of interest, and its dependence on the parity-check 
matrix density; in Section III, we estimate the security level of the system by considering the most dangerous 
structural and local attacks; in Section IV, we show how to optimize the private parity-check matrix density by 
taking into account complexity; in Section V, we provide some system design examples obtained through the
proposed procedure and, finally, in Section VI, we draw some concluding remarks.

II. Error correction performance 

QC-LDPC and quasi-cyclic moderate-density parity-check (QC-MDPC) code-based variants of the McEliece
cryptosystem use codes with length $n = n_0 \cdot p$, dimension $k = k_0 \cdot p$ and redundancy $r = p$, where
$n_0$ is a small integer (e.g., $n_0 = 2, 3, 4$), $k_0 = n_0 - 1$, and $p$ is a large integer (on the order of some
thousands or more). The code rate is therefore $\frac{n_0 - 1}{n_0}$. Since adopting a rather high code rate is
important to reduce the encryption overhead on the cleartext, in this work we focus on the choice $n_0 = 4$,
such that the size of a cleartext is 0.75 times that of the corresponding ciphertext.

The private key contains a quasi-cyclic (QC) parity-check matrix having the following form [15], [18]: 

$H = [H_0 \,|\, H_1 \,|\, \ldots \,|\, H_{n_0-1}]$, (1)

where each $H_i$ is a circulant matrix with row and column weight $d_v$. It follows that the row weight of $H$ is
$d_c = n_0 d_v \ll n$. So, the code defined by $H$ is an LDPC code or an MDPC code, according to the definition in
[16]. Actually, the border between LDPC and MDPC codes is not sharp: MDPC codes are LDPC codes too, but
their parity-check matrix density is not optimal with regard to the error rate performance.
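
As an illustration of the structure in (1), the following minimal sketch builds one row of a QC parity-check matrix from randomly chosen circulant first rows. The function names, the sparse "list of column indices" representation and the fixed seed are assumptions made for this example only. The sketch does not check that the last circulant block is invertible, which an actual key generation procedure would have to ensure.

```python
import random

def random_circulant_first_row(p, d_v, rng):
    """Positions of the d_v ones in the first row of a p x p circulant block."""
    return sorted(rng.sample(range(p), d_v))

def circulant_row(first_row_support, p, i):
    """Support of row i of the circulant: the first row cyclically shifted by i."""
    return sorted((j + i) % p for j in first_row_support)

def qc_parity_check_row(block_supports, p, i):
    """Support of row i of H = [H_0 | ... | H_{n0-1}], columns indexed 0..n0*p-1."""
    row = []
    for b, support in enumerate(block_supports):
        row.extend(b * p + j for j in circulant_row(support, p, i))
    return row

rng = random.Random(0)  # fixed seed, only to make the example reproducible
n0, p, d_v = 4, 4096, 13
blocks = [random_circulant_first_row(p, d_v, rng) for _ in range(n0)]
row = qc_parity_check_row(blocks, p, 5)

# Row weight d_c = n0 * d_v, code length n = n0 * p, code rate (n0 - 1) / n0.
print(len(row), n0 * p, (n0 - 1) / n0)  # 52 16384 0.75
```

Only the $n_0$ first rows need to be stored, since every other row of $H$ follows by cyclic shifts.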

The private key also contains two other matrices: a $k \times k$ nonsingular scrambling matrix $S$ and an
$n \times n$ nonsingular transformation matrix $Q$ having average row and column weight $m$. For the sake of
simplicity, $m$ was always chosen as an integer in previous proposals [14], [15], and $Q$ was a regular matrix.
However, $Q$ can also be slightly irregular, in such a way that $m$ can be rational. This provides a further degree
of freedom in the design of the system parameters, which will be exploited in this paper.

Let $G$ be the private code generator matrix. The public key is obtained as $G' = S^{-1} \cdot G \cdot Q^{-1}$ for
the McEliece cryptosystem and as $H' = S^{-1} \cdot H \cdot Q^T$ for the Niederreiter version [19]. In order to
preserve the QC nature of the public keys, the matrices $S$ and $Q$ are also chosen to be QC, that is, formed by
$k_0 \times k_0$ and $n_0 \times n_0$ circulant blocks, respectively. This way, and by using a suitable CCA2 secure
conversion of the system [3], which allows using public keys in systematic form, the public key size becomes
equal to $k_0 \cdot (n_0 - k_0) \cdot p = (n_0 - 1) \cdot p$ bits, which is very small compared to Goppa code-based
instances. On the other hand, the use of $Q$ in QC form limits the resolution of $m$, which cannot vary by less
than $1/n_0$, but this is not an important limitation in the present context. When using MDPC codes, the matrix
$Q$ reduces to a permutation matrix $P$ (i.e., $m = 1$). In this case, by using a CCA2 secure conversion of the
system, $S$ and $P$ can be eliminated [16], since the public generator matrix can be in systematic form and $G$
can be directly used as the public key. In fact, unlike with Goppa codes, when using MDPC codes, exposing $G$
does not allow an attacker to perform efficient decoding.
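
The key size expression above is easy to evaluate. The short sketch below does so for $n_0 = 4$ and the two extreme circulant sizes considered in this paper. The function name is an assumption of this example, and the count covers only a systematic-form public matrix as allowed by a CCA2 secure conversion.

```python
def public_key_size_bits(n0, p):
    """Bits needed for a systematic QC public matrix: k0 * (n0 - k0) * p."""
    k0 = n0 - 1
    return k0 * (n0 - k0) * p

for p in (4096, 16384):
    bits = public_key_size_bits(4, p)
    print(p, bits, bits // 8)  # circulant size, key size in bits, key size in bytes
```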

Though the public matrices are dense, the public code admits a valid parity-check matrix in the form
$H' = H \cdot Q^T$, which, due to the sparse nature of both $H$ and $Q$, has column and row weight approximately
equal to $d'_v = m d_v$ and $d'_c = m d_c$, respectively. The matrix $Q$ also affects the intentional error vectors
used for encryption: if Alice adds $t$ intentional errors when encrypting a message, then Bob must be able to
correct up to $t' = mt$ errors to decrypt it [15].
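
The effect of $Q$ on the weights and on the error count can be sketched as follows. The function names are hypothetical, and rounding $mt$ to the nearest integer is an assumption of this example (the text only states $t' = mt$). The sample values $d_v = 15$, $d'_v = 59$ and $t = 47$ are those used in the design examples of Section V.

```python
from fractions import Fraction

def public_weights(d_v, n0, m):
    """Approximate column and row weights of H' = H * Q^T: (m * d_v, m * n0 * d_v)."""
    return m * d_v, m * n0 * d_v

def errors_to_correct(t, m):
    """Number of errors t' = m * t seen by the private decoder (rounded, by assumption)."""
    return round(m * t)

m = Fraction(59, 15)  # rational average weight of Q, about 3.93
print(public_weights(15, 4, m))   # d'_v = 59, d'_c = 236
print(errors_to_correct(47, m))   # 185
```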

Concerning the error correction performance of the private code, though for LDPC codes its evaluation
without simulations is in general a hard task, we can get a reasonable estimate by computing the bit flipping (BF)
decoding threshold [15]. We have computed this threshold, for $n_0 = 4$, by considering a fixed and optimized
decision threshold for the BF decoder, and letting $p$ vary between $2^{12}$ and $2^{14}$. Since we are interested in
studying the dependence of the BF threshold on the parity-check matrix density, we computed such a threshold
for different column weights ($d_v$) ranging between 13 and 77. The results obtained are reported in Fig. 1. We
observe that the decoding threshold, so estimated, increases linearly in the code length, and generally decreases
for increasing parity-check matrix densities, though with some local oscillations.
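
For orientation, a hard-decision BF decoder with a fixed decision threshold can be sketched as below. This is a generic textbook variant, not necessarily the decoder used for the results in this paper, and it does not compute the BF threshold itself. The toy code ($p = 13$, two circulant blocks of weight 3), the function names and the decision threshold of 2 are assumptions chosen only so that the example is small and runs instantly.

```python
def circulant_rows(first_row_support, p, offset):
    """Supports of all p rows of a circulant block whose columns start at `offset`."""
    return [[offset + (j + i) % p for j in first_row_support] for i in range(p)]

def is_codeword(check_rows, word):
    """True if every parity check is satisfied."""
    return all(sum(word[j] for j in row) % 2 == 0 for row in check_rows)

def bit_flip_decode(check_rows, received, threshold, max_iter=20):
    """Flip every bit involved in at least `threshold` unsatisfied checks, iteratively."""
    word = list(received)
    for _ in range(max_iter):
        unsatisfied = [row for row in check_rows if sum(word[j] for j in row) % 2]
        if not unsatisfied:
            break
        counts = [0] * len(word)
        for row in unsatisfied:
            for j in row:
                counts[j] += 1
        flips = [j for j, c in enumerate(counts) if c >= threshold]
        if not flips:
            break  # decoder is stuck: no bit reaches the decision threshold
        for j in flips:
            word[j] ^= 1
    return word, is_codeword(check_rows, word)

# Toy QC code: H = [H_0 | H_1] with p = 13 and column weight 3 (not a secure size).
p = 13
check_rows = [a + b for a, b in zip(circulant_rows([0, 1, 4], p, 0),
                                    circulant_rows([0, 2, 7], p, p))]
received = [0] * (2 * p)  # all-zero codeword ...
received[5] = 1           # ... with a single bit error
decoded, ok = bit_flip_decode(check_rows, received, threshold=2)
print(ok, sum(decoded))   # True 0
```

In this toy code any two columns share at most one check, so a single error produces three unsatisfied checks on the erroneous bit and at most one on every other bit, and a decision threshold of 2 isolates it in one iteration.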

Actually, the BF threshold represents the waterfall threshold when using BF decoding on an infinite-length
code without cycles in the Tanner graph, and hence it does not correspond to sufficiently low error rates when
such a decoding algorithm is used on finite-length codes. However, several variations and improvements of the
BF algorithm have been proposed for decoding LDPC codes, and they actually provide very low, and even
negligible, residual error rates when the number of errors equals, or slightly exceeds, the BF threshold [15].
Even better performance can be achieved by using LDPC decoding algorithms based on soft decision, like the
sum-product algorithm (SPA). Thus, for these codes, we can actually use the BF threshold as a measure of
the number of errors that can be corrected with very high probability. An example in this sense is provided in
Fig. 2, where the error correcting performance achieved by eight QC-LDPC codes with $n_0 = 4$, $p = 4096$ and
$d_v = 13$ through SPA decoding is reported. The residual bit error rate (BER) and codeword error rate (CER)
after decoding have been assessed through simulation. According to Fig. 1, the BF threshold for these codes
is 181 errors, and Fig. 2 confirms that it provides a conservative estimate of the number of correctable errors.

Fig. 1. BF decoding threshold as a function of the code length for $n_0 = 4$ and several parity-check matrix column weights ($d_v$).

The same conclusion does not seem to be valid for MDPC codes, especially for high $n_0$ values. As an
example, we have considered a code with $n_0 = 4$, $n = 25088$ and $d_v = 85$. Its BF threshold is at 77 errors;
however, we have verified through simulations that, with 68 intentional errors, the SPA achieves a residual
CER of about $4 \cdot 10^{-3}$. This result can be improved by resorting to BF decoding. In fact, for MDPC codes,
which have many short cycles in their Tanner graphs, using soft information may result in worse performance
than using good hard-decision decoding algorithms. For example, the BF decoder with variable and optimized
decision thresholds is able to reach a residual CER of about $1.5 \cdot 10^{-5}$. However, these residual error rates
confirm that, for MDPC codes, the BF threshold may overestimate the number of correctable errors.

Fig. 2 also provides another important piece of information. The first four codes considered (denoted by
rand$_i$, $i = 1, \ldots, 4$) were designed completely at random, that is, by randomly choosing the positions of the
13 ones in the first row of each circulant block. The second four codes considered (denoted by RDF$_i$,
$i = 1, \ldots, 4$) were instead designed by using random difference families (RDF) [13].

From the figure we observe that no significant difference appears between the two sets of curves. These codes
have the lowest parity-check matrix density among those considered, that is, $d_v = 13$. A similar behavior was
observed in [16] for MDPC codes with $d_v$ on the order of 45 or more. This suggests that, for the parity-check
matrix densities that are of interest for this kind of application, there is no substantial difference
between completely random and constrained random code designs. A difference would instead appear for
sparser matrices, like those of interest for the application of LDPC codes to transmissions (that is, with $d_v$ on
the order of some units), for which short cycles in the Tanner graph deteriorate the code minimum distance. Hence,
it is reasonable to conclude that a completely random code design can be used in this context, independently
of the parity-check matrix density of the private code. Therefore, the security reduction provided in [16] also
applies to LDPC code-based variants of the McEliece cryptosystem, similarly to those using MDPC codes.

Fig. 2. Simulated SPA decoding performance (BER and CER) for completely random and RDF-based codes with $n_0 = 4$, $p = 4096$
and $d_v = 13$.



III. Security level 

The most dangerous attacks against the considered systems are dual code attacks (DCA) and information 
set decoding attacks (ISDA) [15]. In order to estimate the work factor (WF) of these attacks, we consider 
the algorithm proposed in [20] to search for low weight codewords in a random linear code. Actually, some 
advances have recently appeared in the literature concerning decoding of binary random linear codes [21], 
[22]. However, these works are more focused on asymptotic evaluations rather than on actual operation counts, 
which are needed for our WF estimations. Also "ball collision decoding", proposed in [23], achieves important 
WF reductions asymptotically, but these reductions are negligible for the considered code lengths and security 
levels. 

DCA aim at obtaining the private key from the public key by searching for low weight codewords in the
dual of the public code. This way, an attacker could find the rows of $H'$, and then use $H'$, which is sparse,
to decode the public code through LDPC decoding algorithms. The row weight of $H'$ is $d'_c = n_0 d'_v$ and the
corresponding multiplicity is $r = p$. Figure 3 reports the values of the WF of DCA, as functions of $d'_v$, for the
shortest and the longest code lengths here considered. We observe that, for a fixed $d'_v$, the two curves differ by
less than $2^4$, hence DCA exhibit a weak dependence on $n$.

ISDA instead aim at finding the error vector $e$ affecting an intercepted ciphertext. This can be done by
searching for the minimum weight codewords of the extended code generated by
$G'' = \begin{bmatrix} G' \\ x \end{bmatrix}$, where $x$ is the intercepted ciphertext. This search
is facilitated by the QC nature of the codes we consider, since each block-wise cyclically shifted version of
an intercepted ciphertext is another valid ciphertext. Hence, $G''$ can be further extended by adding
block-wise shifted versions of the intercepted ciphertext, and the attacker can search for one among as many shifted
versions of the error vector. We have considered the optimum number of shifted ciphertexts that can be used by
an attacker, and computed the WF of ISDA according to the above procedure. The results obtained are reported
in Fig. 4, as functions of the number of intentional errors, for the smallest and the largest code lengths here
considered. Also in this case, we observe that the WF of the attack has a weak dependence on the code length.

Fig. 4. ISDA WF ($\log_2$) as a function of the number of intentional errors, for $n_0 = 4$ and $p = 4096, 16384$.

From Fig. 4 we also observe that the ISDA WF (in $\log_2$) increases linearly in the number of intentional
errors, and we know from Fig. 1 that the decoding threshold increases linearly in the code length. Hence,
provided that $d'_v$ is chosen in such a way that DCA have WF equal to or higher than ISDA, the security level
of the system increases linearly in the code length, which is a desirable feature for any cryptosystem.

IV. Density optimization 

Some features of the McEliece cryptosystem variants we study are not affected by the private parity-check
matrix density. One of them is the key size. In fact, the public key is always a dense matrix and, hence, its
size does not change between LDPC and MDPC code-based variants. The public key size can be reduced to
the minimum by using $n_0 = 2$, as in [16], but this reduces the code rate to 1/2, which is less than in the
original McEliece cryptosystem and its most recent variants. We instead consider $n_0 = 4$, which gives slightly
larger keys, but also a more sensible code rate. In fact, due to the QC nature of the public matrices, the public
key size remains very small, and increases linearly in the code length, that is, for the considered cryptosystem,
in the security level. Some examples of key size can be found in [14]-[16], both for classical cryptosystem
versions and CCA2 secure conversions.

Also the encryption complexity is not affected by the private matrix density, since encryption is performed 
through the dense public matrix. Concerning decryption, the following steps must be performed to decrypt a 
ciphertext [15]: 

i) multiplication of the ciphertext by Q; 

ii) LDPC decoding; 

iii) multiplication of the decoded information word by S. 

The last step is not affected by the private parity-check matrix density, while the complexity of the first two
steps depends on it. More specifically, the matrix $Q$ is sparse, hence the cost of step i) is proportional to its
average column weight ($m$). Since, once $d'_v$ has been fixed according to the desired security level against DCA,
$m$ equals $d'_v / d_v$, complexity depends on the private code parity-check matrix density.

LDPC decoding is performed through iterative algorithms working on the code Tanner graph, which has a
number of edges equal to the number of ones in the code parity-check matrix. Hence, for a given $d'_v$, the choice
of $m$ and $d_v$ represents a tradeoff between the complexity of steps i) and ii): increasing $d_v$ (and decreasing
$m$, at most down to 1, as in MDPC code-based variants) decreases the complexity of step i) and increases that
of step ii), while increasing $m$ (and decreasing $d_v$, as in [14], [15]) increases the complexity of step i)
and decreases that of step ii).

In order to assess this tradeoff, we define two compact complexity metrics for steps i) and ii): $n m$ is the
number of operations needed to perform multiplication of a vector by $Q$, and $n d_v I$, where $I$ is the average
number of decoding iterations, is proportional to the number of operations needed to perform LDPC decoding.
In order to provide the actual count of binary operations, the latter should be further multiplied by the number
of binary operations ($\alpha$) performed along each edge of the Tanner graph. However, this quantity depends on
the specific decoding algorithm used. In order to keep our analysis as general as possible, we first consider
$\alpha = 1$, and we will comment on the effect of higher values of $\alpha$ later on.

Since $d_v = d'_v / m$, optimizing the tradeoff between steps i) and ii) reduces to choosing the $m$ which minimizes:

$C(m) = n \frac{d'_v}{m} I + n m$. (2)

This must be performed by considering a value of $d'_v$ able to guarantee sufficient security against DCA (see
Fig. 3) and a value of $n$ such that the code is able to correct $t' = mt$ errors, where $t$ is chosen in such a way
as to reach a sufficient security level against ISDA (see Fig. 4).

We observe that the minimum of (2) corresponds to $m' = \sqrt{d'_v I}$. However, for $m = m'$, the private code
might be unable to correct all $mt$ errors, hence a smaller value of $m$ might be necessary. In addition, a high value
of $m$ implies a small $d_v$ and, if $d_v$ becomes too small, the private parity-check matrix could be discovered by
enumeration. On the other hand, by decreasing $m$ below $m'$, the value of (2) increases, and reaches a maximum
for $m = 1$, which is the minimum $m$ allowed to have a nonsingular matrix $Q$. Based on these considerations,
we can conclude that the optimum value of $m$ is always greater than 1, and comprised between 1 and $m'$. By
considering a more sensible value of $\alpha > 1$, $m'$ would further increase. However, this would have no effect on
the actual optimal value of $m$, which, for the system parameters that are of practical interest, always remains
below $\sqrt{d'_v I}$.
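
The behavior of the cost function (2) can be checked numerically with a minimal sketch. The function names are assumptions of this example, and the per-edge factor $\alpha$ is included as an optional multiplier of the decoding term, as described in the text. The sample values $n = 16384$, $d'_v = 59$ and $I = 10$ are those of the first design example in Section V.

```python
import math

def decryption_cost(n, d_v_pub, m, iterations, alpha=1.0):
    """C(m) = alpha * n * (d'_v / m) * I + n * m; equation (2) corresponds to alpha = 1."""
    return alpha * n * (d_v_pub / m) * iterations + n * m

def unconstrained_minimizer(d_v_pub, iterations, alpha=1.0):
    """Value m' that sets dC/dm = 0, i.e. sqrt(alpha * d'_v * I)."""
    return math.sqrt(alpha * d_v_pub * iterations)

n, d_v_pub, iterations = 16384, 59, 10
m_prime = unconstrained_minimizer(d_v_pub, iterations)
print(round(m_prime, 2))  # about 24.29

# The cost decreases monotonically as m grows from 1 towards m'.
for m in (1, 2, 3.93, 8, m_prime):
    print(round(m, 2), round(math.log2(decryption_cost(n, d_v_pub, m, iterations)), 2))
```

Since $m'$ is far above the values of $m$ for which the private code can still correct $mt$ errors, the error correction constraint, and not the stationary point of (2), is what fixes $m$ in practice.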

Finally, we also observe that a low value of $m$ also affects the total number of different matrices which can
be chosen as $Q$. When $m = 1$, the matrix $Q$ becomes a QC permutation matrix $P$, that is, a matrix formed
by $n_0 \times n_0$ circulant blocks with size $p$, among which only one block per row and per column is a circulant
permutation matrix, while all the other blocks are null. Hence, the total number of different choices for $P$ is
$p^{n_0} n_0!$. For example, by considering the parameters proposed in [16] for achieving 80-bit security, which are
($p = 4800$, $n_0 = 2$), ($p = 3584$, $n_0 = 3$) and ($p = 3072$, $n_0 = 4$), we would have, respectively, $2^{25.46}$,
$2^{38.01}$ and $2^{37.34}$ different choices for $P$, which would be too few to guarantee security. However, this
weakness can be avoided by resorting to a CCA2 secure conversion of the system, and hence eliminating $S$ and
$P$, as pointed out in [16]. On the other hand, when using higher values of $m$, this potential weakness can easily
be avoided, even for moderately high values of $n_0$ (like $n_0 = 3, 4$), as needed for achieving high code rates.

V. Design examples 

We first consider the target of 100-bit security. According to Figs. 3 and 4 (and assuming the shortest code
length there considered, which provides a conservative estimate), this can be achieved, with $n_0 = 4$, by choosing
$d'_v = 59$ and $t = 47$. An MDPC code with length $n = 16384$ and $d_v = 59$ has a BF threshold equal to 68
errors, and we have verified that it is actually able to correct 47 errors with very high probability. Hence these
parameters provide a 100-bit security system design with $m = 1$. Instead, if we fix $d_v = 15$ (that is, $m = 3.93$),
we have $t' = 185$. From Fig. 1 it results that an LDPC code with $d_v = 15$ and $n = 16384$ has a BF threshold
equal to 187 errors, and we have shown in Section II that, for such sparse codes, the BF threshold actually
provides a conservative estimate of the number of correctable errors. So, we have two system designs which
achieve the same security level, but with different matrix densities. In these two cases, and by considering that
a typical value of $I$ is 10, we have $C(1) = 2^{23.21}$ and $C(3.93) = 2^{21.27}$.

As another example, we consider a 128-bit security level. Similarly to the previous case, from Figs. 3 and
4 we obtain that this requires $d'_v = 77$ and $t = 62$. An MDPC code-based design can be obtained with code
length $n = 28672$ (and $d_v = 77$), which provides a BF threshold equal to 98 errors. We have verified that
such an MDPC code is actually able to correct 62 errors with very high probability, hence this solution reaches
128-bit security with $m = 1$. An LDPC code-based alternative can be obtained by using the same code length
and $d_v = 15$, that is, $m = 5.13$. In this case, the BF threshold is equal to 327 errors, hence the code is
able to correct all the $t' = 318$ errors with very high probability. In these cases (and with $I = 10$), we have
$C(1) = 2^{24.40}$ and $C(5.13) = 2^{22.09}$.
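
The figures quoted in the two examples can be recomputed from (2) with the short sketch below. The function name and the table layout are assumptions of this example, $m$ is taken as the exact ratio $d'_v / d_v$ rather than its two-decimal rounding, and $t'$ is rounded to the nearest integer, so the last printed digit may differ slightly from the values in the text.

```python
import math

def log2_cost(n, d_v_pub, d_v, iterations=10):
    """log2 of C(m) = n * d_v * I + n * m, with m = d'_v / d_v."""
    m = d_v_pub / d_v
    return math.log2(n * d_v * iterations + n * m)

# (label, code length n, public column weight d'_v, private column weight d_v, t)
designs = [
    ("100-bit, MDPC", 16384, 59, 59, 47),
    ("100-bit, LDPC", 16384, 59, 15, 47),
    ("128-bit, MDPC", 28672, 77, 77, 62),
    ("128-bit, LDPC", 28672, 77, 15, 62),
]
for label, n, d_v_pub, d_v, t in designs:
    m = d_v_pub / d_v
    print(f"{label}: m = {m:.2f}, t' = {round(m * t)}, "
          f"log2 C = {log2_cost(n, d_v_pub, d_v):.2f}")
```

In both cases the sparser private code lowers the cost metric by roughly two bits, that is, by a factor of about four.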

These examples confirm that, for a fixed security level, choosing sparser codes, and hence higher values of
$m$, is advantageous from the complexity viewpoint.

VI. Conclusion 

In this paper, we have analyzed the choice of the private parity-check matrix density in QC-LDPC code-based
variants of the McEliece cryptosystem. We have shown that a given security level can be achieved by
balancing the density of the private parity-check matrix $H$ against that of the matrix $Q$ used to disguise $H$ into
the public key.

Through some practical examples, we have shown that, from the complexity standpoint, it is generally
preferable to decrease the density of the private parity-check matrix and to increase that of the transformation
matrix $Q$. For this reason, LDPC code-based instances of the system are preferable to MDPC code-based
instances if one wishes to keep complexity at its minimum, for a fixed security level.